Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning


Abstract

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible -- to show the reward function's code to the RL agent so it can exploit the function's internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
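The abstract describes a reward machine as a finite state machine whose transitions are triggered by propositions observed in the environment and emit rewards. A minimal sketch of that idea is below; it is a simplification of the paper's formal definition (labels are reduced to single proposition strings, and unmatched labels self-loop with zero reward), and all names are illustrative.

```python
class RewardMachine:
    """A simplified reward machine: a finite state machine that maps
    (state, observed proposition) pairs to (next state, reward) pairs."""

    def __init__(self, initial_state, transitions, terminal_states):
        # transitions: {(state, proposition): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions
        self.terminal_states = set(terminal_states)

    def step(self, proposition):
        """Advance the machine on the proposition true at this environment
        step and return the reward emitted by the taken transition."""
        key = (self.state, proposition)
        if key in self.transitions:
            self.state, reward = self.transitions[key]
        else:
            reward = 0.0  # no matching transition: self-loop, zero reward
        return reward

    def is_done(self):
        return self.state in self.terminal_states


# Illustrative task: "get coffee, then deliver it to the office".
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),  # picked up the coffee
        ("u1", "office"): ("u2", 1.0),  # delivered it: reward 1
    },
    terminal_states=["u2"],
)
```

Because the machine's state summarizes the non-Markovian part of the task ("have I already picked up the coffee?"), an RL agent conditioned on the pair (environment state, machine state) faces an ordinary Markovian problem, which is what enables the decomposition and shaping methods the abstract mentions.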


Similar Articles

Reward, Motivation, and Reinforcement Learning

There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly ...


Compatible Reward Inverse Reinforcement Learning

PROBLEM
• Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert's demonstrations.
• Advantages of IRL over Behavioral Cloning (BC):
  – Transferability of the reward.
• Issues with some IRL methods:
  – How to build the features for the reward function?
  – How to select a reward function among all the optimal ones?
  – What if no access to the environment? ...


An Average-Reward Reinforcement Learning

Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many different control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish ...
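The gain-optimality criterion this snippet refers to is conventionally written as the long-run expected payoff per step (a standard formulation, not quoted from the paper):

```latex
\rho^{\pi} \;=\; \lim_{N \to \infty} \frac{1}{N}\,
\mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{N-1} r_t \,\right]
```

A gain-optimal policy maximizes \(\rho^{\pi}\) over all policies, in contrast to the discounted criterion, which maximizes \(\mathbb{E}^{\pi}[\sum_t \gamma^t r_t]\).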


Hierarchical Average Reward Reinforcement Learning

Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes provides a theoretical basis for HRL. Several variant representati...


Maximum reward reinforcement learning: A non-cumulative reward criterion

Existing reinforcement learning paradigms proposed in the literature are guided by two performance criteria, namely: the expected cumulative reward, and the average reward criteria. Both of these criteria assume an inherent cumulativeness or additivity of the rewards. However, such inherent cumulativeness of the rewards is not a definite necessity in some contexts. Two possible scenarios are p...
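One plausible way to formalize the contrast this snippet draws (an illustration, not taken from the paper itself) is to replace the sum over rewards with a maximum:

```latex
J_{\max}^{\pi} \;=\; \mathbb{E}^{\pi}\!\left[\, \max_{0 \le t \le T} r_t \,\right]
\qquad \text{versus} \qquad
J_{\mathrm{sum}}^{\pi} \;=\; \mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{T} r_t \,\right]
```

Under \(J_{\max}^{\pi}\) only the single best reward along a trajectory matters, so the criterion is non-additive: the value of a trajectory is not the sum of its per-step rewards.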



Journal

Journal title: Journal of Artificial Intelligence Research

Year: 2022

ISSN: 1076-9757, 1943-5037

DOI: https://doi.org/10.1613/jair.1.12440